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Abstract 

Complete mitochondrial genomes have been shown to be reliable markers for phylogeny reconstruction among diverse 
animal groups. However, the relative difficulty and high cost associated with obtaining de novo full mitogenomes have 
frequently led to conspicuously low taxon sampling in ensuing studies. Here, we report the successful use of an eco- 
nomical and accessible method for assembling complete or near-complete mitogenomes through shot-gun next-gener- 
ation sequencing of a single library made from pooled total DNA extracts of numerous target species. To avoid the use of 
separate indexed libraries for each specimen, and an associated increase in cost, we incorporate standard polymerase 
chain reaction-based "bait" sequences to identify the assembled mitogenomes. The method was applied to study the 
higher level phylogenetic relationships in the weevils (Coleoptera: Curculionoidea), producing 92 newly assembled 
mitogenomes obtained in a single lllumina MiSeq run. The analysis supported a separate origin of wood-boring behavior 
by the subfamilies Scolytinae, Platypodinae, and Cossoninae. This finding contradicts morphological hypotheses propos- 
ing a close relationship between the first two of these but is congruent with previous molecular studies, reinforcing the 
utility of mitogenomes in phylogeny reconstruction. Our methodology provides a technically simple procedure for 
generating densely sampled trees from whole mitogenomes and is widely applicable to groups of animals for which 
bait sequences are the only required prior genome knowledge. 
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Introduction 

With the advent of high-throughput next-generation se- 
quencing (NGS) technologies and their ability to generate 
large amounts of data suitable for genomic assembly, system- 
atists are increasingly adopting such methods to reconstruct 
complete mitochondrial genomes (mitogenomes) to infer 
phylogenies across a diverse range of taxa. Such research 
has provided compelling insights in studies ranging from 
the investigation of deep-level metazoan relationships 
(Osigus et al. 2013) to those within single phyla (e.g., 
Cnidaria; Kayal et al. 2013), orders (e.g., Primates; 
Finstermeier et al. 2013), families (e.g., Braconidae wasps; 
Wei et al. 2010), and genera (e.g., Architeuthis giant squid; 
Winkelmann et al. 2013). Mitogenomes have an intrinsic suit- 
ability for phylogenetic analysis due to their unambiguous 
orthology (Botero-Castro et al. 2013), phylogenetic signal at 
diverse taxonomic ranks (Bernt et al. 2013), broadly uniform 



rate of molecular evolution (Papadopoulou et al. 2010), and 
uniparental inheritance consistent with bifurcating phyloge- 
netic trees (Curole and Kocher 1999), even if phylogenetic 
analyses may be confounded by inconsistencies of the coa- 
lescent history near the species level (Funk and Omland 2003) 
and by lineage-specific compositional and rate heterogeneity 
at higher hierarchical levels (Sheffield et al. 2009; Bernt et al. 
2013; Cameron 2014). In addition, the fact that mitochondrial 
DNA (mtDNA) is present in multiple copies per cell, facilitat- 
ing its amplification and sequencing, has undoubtedly con- 
tributed to the wide use of mitochondrial markers in 
phylogeny reconstruction. However, in spite of these advan- 
tages, complete mitogenome sequencing has been compara- 
tively labor intensive and costly, resulting in often 
conspicuously few newly generated mitogenomes per study 
(e.g., 17 bird mitogenomes in Pacheco et al. [2011], four com- 
plete Cnidarian mitogenomes in Kayal et al. [2013], and 
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1 cockroach and 13 termite mitogenomes in Cameron et al. 
[2012]). Techniques have almost always included either shot- 
gun sequencing of expensive multiple-indexed libraries 
(Botero-Castro et al. 2013) or a target-enrichment step, 
such as primer walking using standard polymerase chain re- 
action (PCR) amplification of overlapping fragments (Botero- 
Castro et al. 2013), long-range PCR followed by either 
sequencing-primer walking (Roos et al. 2007) or shot-gun 
sequencing (Timmermans et al. 2010), and hybrid-capture 
using sheared long-range PCR products as "baits" immobi- 
lized on magnetic beads (Winkelmann et al. 2013). Although 
these techniques can generate full mitochondrial genomes, 
each of them has limitations that generally restrain the 
number of taxa or samples that can be incorporated econom- 
ically within a study. 

This study aims to address this sampling bottleneck by 
testing the possibility of parallel de novo mitogenome assem- 
bly from a single library of pooled genomic DNA from a bulk 
sample consisting of many species. This method has recently 
been applied to sequencing of environmental samples of ar- 
thropods from a rainforest canopy (Crampton-Platt AL, 
Timmermans MJTN, Gimmel ML, Kutty SN, Cockerill TD, 
Khen CV, Vogler AP, unpublished data). Here, we apply this 
technique to investigate the higher level phylogeny of an ex- 
tremely diverse superfamily of insects, the weevils 
(Coleoptera: Curculionoidea). Mitogenome sequences in 
the Coleoptera have to date been accumulated gradually 
for major lineages, including the four suborders, mostly 
using Sanger sequencing (Sheffield et al. 2008, 2009; Pons 
et al. 2010; Song et al. 2010; Timmermans et al. 2010). 
These studies consistently encountered difficulties in resolv- 
ing basal relationships in Coleoptera due to apparent com- 
positional heterogeneity (Sheffield et al. 2009; Song et al. 2010) 
and markedly different rates of molecular evolution (Pons 
et al. 2010). However, it is not known whether heterogeneity, 
that confounds deep-level divergences, also affects subclades, 
for example, at the level of superfamilies and families 
(Cameron 2014). In addition, the effect of different data par- 
titioning schemes remains to be investigated across taxo- 
nomic levels (Cameron 2014). 

The Curculionoidea are composed of no fewer than 62,000 
described species distributed wherever terrestrial plants grow 
(Oberprieler et al. 2007). The current higher level classification 
proposed by Bouchard et al. (2011) recognizes nine extant 
families, among which the Curculionidae s. str. is by far the 
largest, containing at least 51,000 species in 17 subfamilies and 
292 tribes and subtribes. The phylogenetic classification of the 
weevils was recognized by the eminent beetle taxonomist 
Crowson (1955) as ". . . probably the largest and most impor- 
tant problem in the higher classification of Coleoptera " 

Since that time, there have been considerable advances in our 
understanding of the phylogeny of this group, with significant 
morphological analyses by Kuschel (1995) and Marvaldi 
(1997). More recently, molecular data have contributed to- 
ward reconstructing weevil higher level relationships, includ- 
ing studies by McKenna et al. (2009), Hundsdoerfer et al. 
(2009), and Jordal et al. (2011), which each incorporated be- 
tween two and six gene markers. A recent analysis of 27 weevil 



mitogenomes using 12 protein-coding genes (Haran et al. 
2013) supported the paraphyly of Curculionoidea s. str. as 
currently defined, because the subfamily Platypodinae was 
recovered in a distant position in a clade with the families 
Dryophthoridae and Brachyceridae that together were sister 
to all other Curculionoidea. Although undertaken with lim- 
ited taxon sampling within the Curculionoidea s. str. (18 
tribes), this last study also supported the division of the 
family into two large clades: One comprising the "broad- 
nosed" weevils (subfamilies Entiminae, Cyclominae, and 
Hyperinae) and another containing the remaining subfamilies 
(except for Platypodinae). In the same study, a tRNA Ala to 
tRNA Arg gene order rearrangement was identified in a cluster 
of six tRNA genes, located between nad3 and nadS, which 
appears to be a synapomorphy for the "broad-nosed" weevil 
subfamilies, further supporting their monophyly. This topol- 
ogy was consistent with that proposed by McKenna et al. 
(2009), who concluded that the initial diversification of wee- 
vils occurred on gymnosperm plants during the Early to early 
Middle Jurassic. 

The Platypodinae is one of several weevil subfamilies that 
are specialist wood-borers, together with the bark-beetles 
(Scolytinae) and the subfamily Cossoninae, although other 
subfamilies also contain xylophagous members (e.g., 
Molytinae, Cryptorhynchinae, and Conoderinae). The evolu- 
tion of wood-boring behavior was investigated in detail by 
Jordal et al. (2011), whose analyses incorporated morpholog- 
ical characters together with molecular data, concluding that 
both Scolytinae and Platypodinae are derived lineages within 
the Curculionoidea sensu Oberprieler et al. (2007). However, 
several important head characters that underpin this relation- 
ship are likely to be homoplasious and associated with tun- 
neling habit (Jordal et al. 2011). Thompson (1992) identified 
distinct characters of the platypodine eighth abdominal ster- 
nite and male genitalia, which indicated a distant relationship 
to Scolytinae and a possible justification for their inclusion in 
a separate curculionoid family. Therefore, the question about 
the polyphyly of wood-boring lineages remains open, and the 
failure of previous mitogenome studies to recover the platy- 
podine and scolytine lineages as monophyletic (Haran et al. 
2013) may be due to limited taxon sampling. The issue there- 
fore may only be resolved if Jordal et al.'s (2011) comprehen- 
sive taxon sampling of wood-boring lineages could be 
matched using mitochondrial genomes. 

Results 

Mitogenomic Assembly 

Specimens were selected to represent a wide taxonomic cov- 
erage, and included 173 species from six different families of 
Curculionoidea, and 16 subfamilies and 104 tribes of 
Curculionidae. They were acquired from various sources 
and in different stages of preservation, leading to variable 
DNA quality, as is common in phylogenetic studies that in- 
volve lineages for which DNA-ready material is difficult to 
obtain. Individual DNA extracts were not characterized in 
great detail, but based on bait PCR success, they are likely 
to differ in the degree of degradation and purity. All DNA 
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extracts were included in a single sequencing pool at equi- 
molar concentrations, although for several, including aliquots 
from 31 specimens already extracted for a previous study 
(Jordal et al. 2011), the available amount of DNA fell short. 
Following sequencing with an lllumina MiSeq, approximately 
5% of the reads resembled mitochondrial sequences after 
BLAST filtering (from a total of 18,341,901 paired-end reads 
obtained in a single MiSeq run). Assemblies constructed with 
the Celera and IDBA-UD assemblers resulted in 338 and 336 
assemblies of more than 1,000 bp, respectively, rising to 361 
assemblies when combined using MinimusZ Of these, 105 
were more than 10 kb in length and potentially represented 
(largely) complete mitogenomes. The cumulative distribution 
of the assemblies by sequence length is shown in figure 1, 
whereas figure 2 represents the frequency distribution of as- 
sembly lengths for each of the Celera, IDBA-UD, and 
AAinimus2 assemblies. The latter produced a shift toward 
longer contigs, especially for the critical contig length of 
more than 15 kb that corresponds to the full length of 
insect mitogenomes. All subsequent analyses were conducted 
on the Minimus2 assemblies. We were able to newly assemble 
and identify a total of 92 complete or near-complete mito- 
genomes comprising at least eight genes, including 75 (43% of 
all pooled samples) containing the full complement of 15 
genes, a further 15 (8.7% of pooled samples) containing 
more than or equal to 12 genes (supplementary table S1, 
Supplementary Material online), and two assemblies contain- 
ing eight and nine genes, respectively. Those falling short of a 
full-gene complement were mainly lacking the ribosomal 
RNA (rRNA) genes, in particular rrnS, which was the least 
common gene, present in only 56 of the assemblies, whereas 
nad6 and cytB were present in all 92 assemblies. A majority of 
86 assemblies contained a portion of the noncoding control 
region, whose exact length is difficult to ascertain because of 
reduced sequence complexity due to the presence of re- 
peated regions. The mean estimated length of the control 
region was 1,190 bp, whereas in those 33 mitogenomes that 
could be circularized, the length varied between 
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Fig. 1. Cumulative distribution of assembly lengths from the Celera, 
IDBA-UD, and the combined Minimus2-generated assemblies. 



approximately 200-2,780 bp (supplementary table S1, 
Supplementary Material online). 

Identification of Mitogenomic Assemblies Using "Bait" 
Sequences 

From the set of 361 partial and complete contigs obtained 
with Minimus2, a total of 163 cox 7 (529-1,560 bp), 154 cytB 
(218-1,147 bp), and 162 rrnL (211-1,340 bp) gene sequences 
were extracted. Sequences from each gene were grouped into 
libraries and used as queries in a BLAST search against each 
corresponding bait sequence reference library. The latter was 
composed of all successful PCR-based sequences from the 173 
original DNA extractions and included 84 coxl-5 ; , 1 15 coxhl, 
132 cytB, and 107 rrnL sequences (fig. 3). All samples used in 
the bulk sequencing were represented by at least one bait (36 
samples), whereas 42, 57, and 36 samples were represented by 
two, three, and four bait sequences, respectively. Matching 
these bait sequences to the 92 long mitogenomic assemblies, 
16 assemblies showed a match to one bait, 30 assemblies 
matched two baits, 32 assemblies matched three baits, and 
14 assemblies matched all four baits. Four of the complete 
and near-complete mitogenomes contained sequences from 
two nonoverlapping assemblies that each matched at least 
one bait from the same specimen. Out of the remaining 81 
weevil samples, there were 37 instances where baits hit a short 
contig that was not included in the collection of near-com- 
plete or complete mitogenome assemblies, but in 44 in- 
stances, the baits did not hit any of the assembled contigs. 
Additionally, one divergent assembly was rejected because it 
was found to match Coleoptera other than weevils in the 
reference database, possibly present in the sample due to a 
contamination. Supplementary table S2, Supplementary 
Material online, summarizes the bait-matching identification 
results, by bait, for each pooled sample, with matching contigs 
given by their unique number and with reasons for identifi- 
cation failures listed. Overall, the different baits contributed 
fairly equally to the final identifications, with 56% of all cox7-3 ; 
baits leading to a successful identification, 53% of cytB, 50% of 
rrnL, and 45% of coxlS'. Proportions of total number of baits, 
bait hits, and hits leading to assembly identifications by gene 
are illustrated in figure 3. A further 50 short contigs (1,025- 
6,437 bp, mean 2,472 bp) matched single baits but were not 
incorporated in the analyses because they contained only a 
maximum of four complete protein-coding or rRNA genes 
each. Their inclusion would have considerably increased the 
amount of missing data in the matrix. 

The total number of reads making up each of the 92 
mitogenomes (which were made up of 96 separate contigs) 
was used to calculate the sequencing depth (fig. 4). The ma- 
jority of sequences showed a 10-50x coverage that generally 
resulted in contigs of 15-20 kb. Coverage reached over 200 x 
in a few cases, but this did not appear to closely correlate with 
contig length. For example, two contigs of high coverage were 
less than 5 kb in length and corresponded to two noncontig- 
uous fragments from the same species (Dryocoetes autogra- 
phus) linked by multiple baits obtained from a single 
specimen. In addition, read coverage was not closely 
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correlated with the initial DNA concentration in the sequenc- 
ing pool. Most samples were present at 10 ng, yet their cov- 
erage varied by more than an order of magnitude, whereas 
coverage for samples present at a concentration up to 4 x 
lower varied over the same range (fig. 4). Twenty-one of the 
31 nonassayed genomic samples resulted in assemblies of 
more than or equal to eight genes (of which 17 assemblies 
contained all 15 genes). We found no taxonomic correlate 



A 25000 

20000 

|p 15000 

1 10000 

o 

u 

5000 
0 

B 10 

7.5 

< 
■z. 

Q g; 
M ° 

c 

2.5 



50 



100 150 200 
Coverage 



250 



300 



10 100 
Coverage 



1000 



Fig. 4. Mean sequencing coverage versus (A) assembly (con tig) length 
(bp) and (B) approximate mass of genomic DNA in the sample pool, for 
identified mitogenomic assemblies. Thirty-one samples that were not 
assayed for DNA concentration are shown at bottom of graph B. 



2226 



Phylogeny of Weevils (Coleoptera: Curculionoidea) • doi:10.1093/molbev/msu154 



MBE 



Table 1. Partitioning Schemes and Nucleotide Substitution Models Selected by Partition Finder for Two Data Sets, According to Gene, and to 
Codon Position (Numbered 1-3) in Protein-Coding Genes. 

Partition Nad2 coxl cox2 atp8 atp6 cox3 nad3 nadS nad4 nad4L nad6 cytB nad1 rml rrnS 
1231231231231231231231231231 23123123123 

All genes 



P1 X X X X XX 

P2XXX XXXXXXXXX 
P3XXXXXXX XX 
P4 X X X X X 

P5 X X X X 

P6 X X X X 

Only protein-coding genes 

P1X X X XXX X X XX 

P2XXX XXXXXXXXX 

P3XXXXXXX XX 

P4 X X X X 

P5 X X X X 

Note. — Reverse strand transcribed genes are indicated in light gray and the rRNA genes in dark gray. Separate partitions are numbered P1-P6 and allocated positions to each 



partition labeled X. 



with sequencing or assembly failure because representatives 
of all six pooled families and 13 of the 16 included subfamilies 
of Curculionidae resulted in long assemblies (the three miss- 
ing subfamilies were represented only by a total of five speci- 
mens). Specimen size is also unlikely to be the dominant 
limiting factor in determining sequencing success because 
many of the small-sized (~2-5mm) Scolytinae produced 
full assemblies. 

Phylogenetic Analyses 

The 92 new assemblies were combined with existing data, for 
an aligned data matrix of 122 samples and 13,792 positions. 
Of the final set of mitogenomes, 2 belonged to the family 
Anthribidae, 5 to Attelabidae, 3 to Brachyceridae, 4 to 
Brentidae, 4 to Dryophthoridae, 1 to Nemonychidae, and 
101 belonged to 67 identified tribes within the 
Curculionidae, including 19 tribes of the wood-boring 
Scolytinae. The optimal partitioning scheme was established 
using Partition Finder, starting with a total of 39 partitions (41 
partitions with the two rRNA genes included) that split all 13 
genes (15 in data sets A, C, and E) and three codon positions 
in each protein-coding gene. PartitionFinder selected five par- 
titions for the "only protein-coding genes" data set and six 
partitions for the "all genes" data set, whereby the two rRNA 
genes were grouped with the first codon positions of nad2, 
nad3, and nad6 and the second codon position of atp8 
(table 1). For both data sets, the first and third codon posi- 
tions on forward and reverse strands were split into separate 
partitions, whereas all second positions were collapsed into a 
single partition. Forward and reverse genes mainly differed in 
base frequencies, with a shift from A to T and G to C in the 
reverse strand partitions, and rates shifted accordingly (nor- 
malized to the time-reversible G-T changes; supplementary 
fig. S3, Supplementary Material online). The data set contain- 
ing "only protein-coding genes R-Y coded" resulted in only 
two partitions, separating first and second codon position for 



both strands combined (third positions are removed from 
this data set). The findings are in accordance with previous 
observations on Curculionoidea that also showed a great im- 
provement in likelihood values when partitioning by both 
codon position and strand (Haran et al. 2013), reflecting 
the great differences in codon usage in genes coded on 
either strand (also see Pons et al. 2010). However, this does 
not extend to produce differences in variation in amino acid 
changes, as forward and reverse strands were consistently 
grouped into a single partition for the data set using second 
position only and for the R-Y-coded matrix (eliminating first 
codon synonymous changes). 

The maximum-likelihood (ML) trees were greatly im- 
proved using six partitions over an unpartitioned analysis, 
but the benefit of using a model with 41 or 39 separate 
partitions was low, as seen from the small additional improve- 
ment in the Akaike information criterion (AIC) values 
(table 2). Interestingly, the improvement in ML from using 
the partitioned models was very similar whether the trees 
were obtained directly under the partitioned model or ob- 
tained under the unpartitioned model but with the likelihood 
calculated under partitioning (table 2). Hence, despite the 
greatly improved likelihood scores after partitioning, the re- 
sulting trees differ only slightly in parameters of greatest 
impact on the likelihood. Indeed, the topologies are little 
changed between searches using the unpartitioned model, 
6-partition model (5-partition model without rRNA genes), 
and the 41 (39) partition model, and hence, there was only a 
small increase in likelihood if the simpler model is imposed on 
the tree obtained with the more complex model. 

ML trees obtained with the various coding schemes 
(including or excluding rRNA genes, R-Y coding, presence 
of third codon position; supplementary table S4, 
Supplementary Material online) also resulted in highly con- 
gruent topologies based on strongly supported (>80% boot- 
strap analysis [BS]) nodes. Figure 5 depicts the best RAxML 
tree obtained with the "all genes" data set under six partitions. 
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Table 2. ML of Trees under Different Partitioning Schemes. 



Data Set 


Partitioning Scheme 


Topological 


Number of 


Substitution 


Number of 


Ln L 


AIC 


AAIC 






Constraint 


Partitions 


Model 


Parameters 








All genes 


Unpartitioned (one partition) 


None 


1 


GTR 


8 


-787,773 


1,575,562 


62,885 




Partition Finder (six partitions) 


On one partition tree 


6 


GTR 


48 


-758,061 


1,516,219 


3,349 




Gene/codon-position (41 partitions) 


On one partition tree 


41 


GTR 


328 


-756,379 


1,513,414 


737 




Gene/codon-position (41 partitions) 


On six partition tree 


41 


GTR 


328 


-756,272 


1,513,199 


522 




Partition Finder (six partitions) 


On 41 partition tree 


6 


GTR 


48 


-758,010 


1,516,116 


3,439 




Gene/codon-position (41 partitions) 


None 


41 


GTR 


328 


-756,010 


1,512,677 


n/a 




Partition Finder (six partitions) 


On one partition tree 


6 


GTR 


48 


-758,061 


1,516,219 


3,542 


Protein-coding 


Unpartitioned (one partition) 


None 


1 


GTR 


8 


-684,161 


1,368,339 


34,473 


genes 


Gene/codon-position (39 partitions) 


On 1 partition tree 


39 


GTR 


312 


-666,834 


1,334,219 


425 




Partition Finder (5 partitions) 


None 


5 


GTR 


40 


-668,480 


1,337,039 


3,173 




Gene/codon-position (39 partitions) 


On five partition tree 


39 


GTR 


312 


-666,678 


1,333,981 


115 




PartitionFinder (five partitions) 


On 39 partition tree 


5 


GTR 


40 


-668,523 


1,337,127 


3,261 




Gene/codon-position (39 partitions) 


None 


39 


GTR 


312 


-666,621 


1,333,866 


n/a 




PartitionFinder (five partitions) 


On one partition tree 


5 


GTR 


40 


-668,567 


1,337,213 


3,347 



Note. — Trees were obtained under no partitioning under the six- or five-partition schemes selected by PartitionFinder, and by the maximum number of partitions tested 
(partitioning by gene and codon position). Each of the resulting trees was then assessed for their likelihood under the alternative models. Note the comparatively small difference 
in likelihood (AAIC) under each partitioning scheme regardless of the model used in the tree search. 



Indicated on this tree are nodes that are retained in the strict 
consensus of trees obtained from all different treatments of 
the data, and those nodes unresolved in the strict consensus, 
that is, the nodes whose resolution is consistent with the 
strict consensus. Nodes with high nodal support (80-100% 
BS) occurred throughout the entire span of nodal ages, and 
this pattern is found across all analyses (supplementary fig. S5, 
Supplementary Material online). Results obtained from the 
three additional smaller subsets of data indicate that the trees 
obtained using the plus- and minus-strand-encoded subsets 
of genes (supplementary figs. S8 and S9, Supplementary 
Material online) agree well with the full matrix-derived 
trees, but importantly, those constructed using only the "bait" 
sequences (supplementary fig. S6, Supplementary Material 
online) contain much lower nodal support than any of the 
mitogenomic trees. This is expected from a data matrix that 
has much missing data, which consequently does not allow 
for robust inference of relationships. 

The data set also allowed us to address the question about 
the hierarchical level at which the confounding effects of 
compositional heterogeneity may be encountered (Sheffield 
et al. 2009; Song et al. 2010). The % 2 test of base heterogeneity 
(Swofford 2002) revealed that with only one exception (atp8) 
the data are heterogeneous by this test (supplementary table 
S7, Supplementary Material online). In contrast, the R-Y 
recoded data, stripped for third positions, indicated that 
most genes are homogeneous by this test, although not for 
the concatenated complete matrix. However, the more de- 
fensible test of Foster (2004) showed that only cox3, cytb, and 
nadl are homogenous in composition. Hence, the issues of 
heterogeneity persist at a much lower hierarchical level than 
the subordinal and superfamily-level relationships investi- 
gated previously (Sheffield et al. 2009; Song et al. 2010). 

Family- Level Relationships 

All 15 analyses recovered the monophyletic "ambrosia bee- 
tles," Platypodinae (100% BS) outside the other "true weevils" 



(=Curculionidae sensu Bouchard et al. 2011), which would 
otherwise be monophyletic. In most analyses, except those 
including R-Y-coded protein-coding genes, Platypodinae was 
placed in the sister clade to the rest of Curculionidae, together 
with the Dryophthoridae (palm weevils) and the brachycerid 
genus Ocladius, with moderate to strong support for 
this adelphic relationship (62-95% BS). In all analyses, the 
monophyletic Brentidae (100% BS) were recovered as the 
sister taxon to a Curculionidae + Dryophthoridae + 
Brachyceridae clade with very strong nodal support (100% 
BS). The sister relationship between the monophyletic 
(100% BS) Attelabidae (leaf-rolling weevils) and this latter 
clade plus Brentidae was similarly very strongly supported 
(100% BS) across all analyses. The Nemonychidae was consis- 
tently recovered as sister to the clade containing Attelabidae 
and all other weevil families mentioned so far. Support for this 
relationship was very high, ranging from 98% to 100% 
BS across analyses. The two taxa belonging to the 
Anthribidae were always recovered as monophyletic (100% 
BS). Within the Attelabidae, the subfamilies Apoderinae and 
Rhynchitinae were recovered as monophyletic with BS sup- 
port of 100% and 83-97%, respectively, across analyses. 

Relationships within Curculionidae s. str. 
In most analyses, the subfamily Bagoinae, represented only by 
a single Bagous, was recovered as the sister to all other 
Curculionidae (excepting Platypodinae as noted above), 
with BS support between 66% and 91%. Similarly, most anal- 
yses resulted in the recovery of both a monophyletic 
Entiminae + Cyclominae + Hyperinae clade (marked A in 
fig. 5; 100% BS) and a strongly supported sister relationship 
between this clade and a second clade (marked B in fig. 5) 
containing all other Curculionidae subfamilies (100% BS). 
Within the entimine clade, the Entiminae itself is not recov- 
ered as monophyletic because the tribe Sitonini is consistently 
recovered (100% BS) either as sister to the clade containing 
Hyperinae + Cyclominae + the rest of Entiminae or in a 
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sister clade also containing the Hyperinae (with generally 
weak nodal support for this relationship). Three entimine 
tribes are consistently recovered as monophyletic, with 
strong nodal support; the Otiorhynchini (100% BS), 
Brachyderini (100% BS), and the Naupactini (100% BS). The 
tribe Tropiphorini is apparently paraphyletic because a well- 
supported clade (95% BS), containing two monophyletic 
Australian members (Catasarcus and Leptopius), is itself 
sister to the Naupactini with strong support (96% BS) and 
is only distantly related to the other Tropiphorini species in 
the data set (Tropiphorus), which is sister to the 
Otiorhynchini with strong nodal support (100% BS). All 
Entiminae (except Sitona) are marked by an ARNSEF to 
RANSEF rearrangement in the tRNA cluster, discovered in 
earlier studies (Song et al. 2010; Haran et al. 2013) and cor- 
roborated here (fig. 5). One taxon, Dichotrachelus manueli, 
classified in Cyclominae by Alonso-Zarazaga and Lyal (1999), 
also possesses this same rearrangement, whereas the remain- 
ing Cyclominae taxa possess the common gene order, 
ARNSEF. Sitona and Hypera were characterized by unique 
RNSAEF and REANSF gene orders, respectively, first observed 
by Haran et al. (2013) and hypothesized to constitute an 
initial step in the evolution of the derived gene order of the 
Entiminae. Here, Hypera + Sitona form a clade that is sister 
to all others in clade A, whereas the Cyclominae (minus 
Dichotrachelus), not represented in Haran et al. (2013), and 
exhibiting the ancestral gene order, occupy the next node as 
sister to the remaining Entiminae characterized by the derived 
gene order. This demonstrates that the gene order changes in 
Hypera and Sitona are independent of those in Entiminae. 

Within the second main curculionid clade, the scolytine 
taxon Coptonotus (Coptonotini) is never recovered together 
with the bulk of the scolytines, which except for Scolytini 
(monophyletic with 100% BS), are consistently recovered in 
a clade with moderate to high support values of 66-100%. 
The scolytine tribes Corthylini and Ipini are always recovered 
as monophyletic (100% BS support) within this. The following 
higher level taxa from the second main Curculionidae clade 
are recovered as monophyletic across all analyses (BS sup- 
ports follow taxon name): Ceutorhynchinae (100%), Lixinae 
(100%), Conoderinae Lobotrachelini (100%>)> and 
Curculioninae Cionini (100%). The Cryptorhynchini appears 
to be paraphyletic owing to the presence of a sample 
(Cryptorhynchini sp. from Cameroon) falling outside the 
well-supported clade (98% BS) comprising all four other 
genera analyzed. 

Discussion 

Contig Formation from Pooled Total DNA 
Sequencing 

Our results provide a clear demonstration of economic, effi- 
cient and reliable sequencing, assembly, and identification of 
large numbers of mitogenomes from a pool of total DNA of 
numerous samples, without any enrichment or PCR amplifi- 
cation. We obtained a complete or near-complete set of pro- 
tein-coding genes for well over 50% of all samples attempted. 
Other recent papers attempting to generate full 



mitochondrial genomes from total DNA either generated a 
separate library for each taxon (Williams et al. 2014) or pooled 
only a small number of distantly related taxa (Rubinstein et al. 
2013). We have been able to employ the resulting sequence 
data to reconstruct a higher level phylogeny of the superfam- 
ily Curculionoidea that is highly congruent with recent mo- 
lecular phylogenies and provides additional evidence for the 
convergent evolution of specialized wood-boring behavior 
and morphology in weevils. The method has been explored 
previously for the analysis of bulk insect samples from a forest 
canopy (Crampton-Platt AL, Timmermans MJTN, Gimmel 
ML, Kutty SN, Cockerill TD, Khen CV, Vogler AP, unpublished 
data), applied to nearly 500 individuals from more than 200 
species. They found that the assembly of mitogenomes from 
bulk samples is hampered by substantial differences in DNA 
concentration for species in the pool, due to variation in both 
body size and number of specimens representing a species. In 
addition, intraspecifk variation was found to cause difficulties 
with assembly due to polymorphisms, mirroring the well- 
known problem with genome assembly from heterozygotes 
(e.g., Langley et al. 2011). The design of this study was ex- 
pected to avoid these problems by normalizing the DNA 
concentration in the pool and by selecting a single individual 
per species. However, we find that there is no close correlation 
of sequencing depth and assembly success (fig. 4), in accor- 
dance with Crampton-Platt AL, Timmermans MJTN, Gimmel 
ML, Kutty SN, Cockerill TD, Khen CV, and Vogler AP (unpub- 
lished data). Our study excludes the presence of intraspecifk 
variation but indicates that there is a sequencing depth at 
which assemblers no longer operate optimally, possibly due to 
the larger numbers of individual sequencing errors contrib- 
uted by overlapping reads. 

A concern of pooled assemblies is the formation of chi- 
meras by the miss-assembly of different mitogenomes. The 
potential for this is expected to increase if closely related 
samples that may not differ in conserved regions of the mito- 
genomes are included in the pool. The prevalence of chimeras 
was tested using 77 taxa for which multiple baits were avail- 
able. In many cases, these tests involved both the cytb or rrnL 
and the two fragments of the coxl gene that map to distant 
positions in the mitogenome. We did not observe a single 
case of chimera formation. In addition, the tree topology gave 
no reason to suggest chimeras, because of the monophyly of 
the smaller families of Curculionoidea, whereas chimera for- 
mation would also have produced great differences in the 
length of terminal branches, which were not observed. 

Phylogenetic Analysis from Densely Sampled 
Mitogenomes 

Together with existing mitogenome sequences, a total of 120 
terminals were included in the phylogenetic analysis. As mito- 
genome data sets increase with the numbers of taxa needed 
for dense sampling, this may produce problems with tree 
searches and model choice. Specifically, the most complex 
models, such as the amino acid-based CAT model used by 
Timmermans et al. (2010) that was required for resolving the 
deep-level relationships within the Coleoptera, are not 
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practical when the number of taxa becomes larger. This raises 
the question of what is the value of using complex models. 
Haran et al. (2013) have shown that likelihood trees of weevils 
can be substantially improved under model partitioning ac- 
cording to 1) codon position and 2) forward versus reverse 
strand, the latter presumably due to the well-established dif- 
ferences in codon usage on either strand. We conducted a 
formal analysis to test whether this partitioning scheme by 
strand and codon captures the most important aspects of the 
nucleotide variation using the PartitionFinder software, start- 
ing from 41 potential partitions of each codon position within 
each gene. This could be reduced to the codon positions for 
all genes on either strands, similar to Haran et al. (2013), but 
maintaining a single partition for the second codon position 
on either strand, while adding a separate partition for the 
rRNA genes not included in that study. The use of these six 
partitions over the full set of 41 partitions led only to a small 
reduction in likelihood, whereas the unpartitioned models 
were substantially worse (table 2). 

A general difficulty for comparing models is that compar- 
isons are only possible for a single topology but searches 
under different partitions favor different topologies. We 
therefore used the optimal trees obtained under no partition- 
ing and the 6- and 41 -partition schemes to assess likelihoods 
of the alternative partitioning schemes on those three topol- 
ogies. The likelihoods on all trees for the three models were 
almost identical (table 2), indicating that tree topology is not 
a major deciding factor for the best model. Taken at face 
value, the 41 partition wins out over the 6 partition scheme 
in all three analyses, but the likelihood gain is minor. As like- 
lihood values become very large with the use of numerous 
whole mitogenomes, AIC values may not be an appropriate 
approach to avoid overparameterization, unless they are nor- 
malized for the total likelihood values (Castoe et al. 2005). We 
therefore believe the 6-partition scheme is fully adequate. In 
addition, the practicalities of tree searches on increasingly 
large data sets from full mitogenomes, as generated with 
the proposed methodology, also strongly argue for parameter 
reduction. 

Trees obtained from analysis of full mitogenomes were the 
most robust, but those obtained using the subsets of protein- 
coding genes resulted in good topological approximations to 
them (supplementary figs. S8 and S9, Supplementary Material 
online), suggesting that phylogenetic signal is largely uniform 
across genes, and is strengthened with additional data. This 
can be seen by the recovery of certain monophyletic groups 
such as the Cyclominae only possible with the full matrix. 
However, trees constructed from the "bait" sequences alone 
were the least robust, due to both the reduced information 
content (comparable to the reverse strand genes) and to 
considerable missing data. 

Implications for the Systematics of Weevils 
The close relationship linking Platypodinae with 
Dryophthoridae, as sister to the Curculionidae s. str., has 
been demonstrated multiple times (Marvaldi 1997; 
McKenna et al. 2009; Haran et al. 2013) and indicates that 



the family Curculionidae, as presently classified, is paraphy- 
letic. The simplified classification system proposed by 
Oberprieler et al. (2007), recognizing a broader 
Curculionidae also containing the presently defined 
Brachyceridae and Dryophthoridae as respective subfamilies 
(sensu Alonso-Zarazaga and Lyal 1999) would be consistent 
with our family-level results. Our results strongly support the 
relationships among the curculionoid families at the base of 
the tree, which are consistent with most previous molecular 
analyses, with the exception of the placement of 
Nemonychidae. This family has previously been suggested 
to be split off at the most basal node (e.g., McKenna et al. 
2009), as opposed to Anthribidae in our results, but our sam- 
pling lacks two of the "primitive" weevil families (Belidae and 
Caridae), prohibiting a definitive conclusion. Our results 
are also consistent with the previously suggested hypothe- 
sis that the Brentidae are the sister family to all the "true 
weevils," Curculionidae, if we include Brachyceridae and 
Dryophthoridae in the latter. 

A previously described deep split within the true weevils 
was confirmed by our substantially increased sampling. 
One strongly supported clade contains the 
Entiminae + Cyclominae + Hyperinae and represents the 
monophyletic and diverse "broad-nosed" weevils, so named 
because of their relatively short and blunt rostrums. 
Rearrangements within the cluster of six tRNA genes are re- 
stricted to this clade, even with our increased taxon coverage, 
further supporting its distinctiveness. The cyclomine genus 
Dichotrachelus, containing the same RANSEF rearrangement 
as all other Entiminae (except Sitona) in our analysis, has been 
treated as belonging to the Entiminae by some authors 
(Meregalli and Osella 2007) on morphological grounds. 
Combined with the low nodal support for its inclusion in a 
monophyletic Cyclominae (<50% BS), our tRNA rearrange- 
ment data are consistent with this opinion. The second clade 
containing all other curculionoid subfamilies, with the excep- 
tion of Bagoinae, which is placed outside of the two main 
clades, is much less satisfactorily resolved, with only two of its 
constituent subfamilies (Lixinae and Ceutorhynchinae) being 
monophyletic. It contains a number of very large subfamilies 
including the Curculioninae, Molytinae, Baridinae, 
Cryptorhynchinae, and Conoderinae, whose relationships 
remain obscure due to a lack of strong nodal support. 
Although the recovery of two tribes within this group being 
monophyletic (Lobotrachelini and Cionini) is encouraging, to 
further investigate the confusing topology of this clade, sig- 
nificantly more representative taxon sampling will be re- 
quired. Indeed, limitations in taxon sampling are often cited 
as potentially limiting factors in higher level phylogenetics 
(Franz and Engel 2010), and this is certainly an important 
consideration in such a large group as the Curculionoidea. 

An interesting finding is that strong nodal support spans 
the full depth of the tree and differing taxonomic ranks (fam- 
ilies, subfamilies, and tribes; supplementary fig. S5, 
Supplementary Material online). This pattern was seen in 
analyses of all data sets and under all partitioning models. 
A potential criticism of mitochondrial sequence data is that 
due to accelerated evolutionary rates, saturation of sites may 
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obscure or distort phylogenetic signal at deeper nodes 
(Talavera and Vila 2011). It is clear from our data that at 
least at the intrasuperfamily level in weevils, this is not nec- 
essarily the case, with phylogenetic signal being evenly distrib- 
uted across the estimated 170 My diversification history of 
the weevils (McKenna et al. 2009). 

Evolution of Wood-Boring Behavior 
The wood-boring weevil subfamilies are highly adapted to 
excavate galleries, either subcortically or in woody tissue, 
and feed on ligneous matter directly or cultivate symbiotic 
fungi in the tunnels as a food source, and for this reason, 
many are widespread pests of forestry (Oberprieler et al. 
2007). The taxon density of the current analysis nearly 
matched the extensive sampling of the wood-boring groups 
by Jordal et al. (2011), a study that is the basis for suggesting 
their close affinity. However, in contrast to Jordal et al. (2011), 
our results support the conclusions of Haran et al. (2013) and 
McKenna et al. (2009), indicating that wood-boring lineages 
are clearly not monophyletic, with Platypodinae consistently 
retrieved as closely related to the Dryophthoridae (and 
Brachyceridae) in a clade sister to all other Curculionidae 
sensu Bouchard et al. (201 1 ). Although our analyses recovered 
neither the Scolytinae nor the Cossoninae as monophyletic, 
and they were never recovered as sister taxa or nested within 
the same clade, we cannot confidently conclude as to the 
relationship between them because only a series of weakly 
supported nodes separate the cossonine taxa and Coptonotus 
from the rest of the Scolytinae. The latter genus is interesting 
for consistently not being recovered in our analyses within the 
generally well-supported Scolytinae clade (excepting 
Scolytini). Based on morphological characters, Coptonotus 
has been considered to be a transitional taxon between 
Platypodinae and other Curculionidae (Jordal et al. 2011) or 
alternatively as an intermediate form between Cossoninae 
and Scolytinae (Thompson 1992), while also containing mor- 
phological characters linking it with Cossoninae. Thompson 
(1992) has suggested a close relationship between 
Coptonotini and the scolytine tribe Hylastini based on struc- 
tures of the aedeagus. However, our results argue against this 
because the Hylastini sample (Hylastes opacus) was retrieved 
with strong support as the sister of Tomicini, and this clade 
itself was strongly supported as sister to the Hylesini, within 
the main Scolytinae clade. 

Conclusions 

We have demonstrated the relative ease of efficiently and 
economically obtaining a large number of mitogenome 
DNA sequences from a pooled mixture of DNA extracts, 
without the need for enrichment or species-specific tagging 
prior to genome pooling. Mitogenome sequences are confi- 
dently identified to specimen with a limited amount of prior 
mtDNA sequence data for each sample and exhibit no error 
with regard to these bait sequences. Our mtDNA genome 
data yield phylogenetic relationships that are highly congru- 
ent with prior expectations and provide phylogenetic signal 
with robustly supported nodes across a broad range of lineage 



divergence times and taxon diversity, from family level to 
generic level, which are consistent across different data par- 
titioning schemes. 

It is evident that the efficiency of our approach will be a 
function of the relative concentration of mitochondrial to 
nuclear DNA within a focal group. The average coleopteran 
genome size is estimated to be approximately 0.65 Gb ± 0.05 
(http://www.genomesize.com, last accessed May 10, 2014). 
Under the assumption that the copy number of mtDNA 
genomes does not differ substantially across organisms, our 
approach should be of broad utility within insect phyloge- 
netics where mean nuclear genome size is estimated to be 
1.22 Gb ± 0.05. However, it may be less efficient for taxa with 
larger average nuclear genome sizes (e.g., crustaceans: mean 
nuclear genome size = ~4.45 Gb ± 0.45). A further consider- 
ation for the implementation of our approach is taxon sam- 
pling and the mitogenomic assembly pipeline. Our sampling 
for the higher level taxonomic relationships within the 
Curculionoidea provides little challenge for the pipeline, as 
mtDNA genomes sampled from different genera exhibit high 
DNA sequence divergence. Genome divergence facilitates 
genome reassembly from a mixed pool of genome fragments, 
and the pipeline efficiency will eventually be compromised as 
mtDNA genome relatedness increases. Our data suggest that 
this limit lies somewhere below an uncorrected divergence of 
10% for coxl and cytB that characterizes the two species of 
Cionus (C olens and C. griseus) included in our sampling. To 
ascertain genome relatedness thresholds for the reassembly 
pipeline, simulation analyses can be employed. However, it is 
important to point out that as NGS technology and read 
lengths improve, relatedness thresholds will also become 
more favorable. 

Materials and Methods 

Taxon Sampling, DNA Extraction, and Quantification 
Throughout this study, the most recent higher level classifi- 
cation of Curculionoidea, proposed by Bouchard et al. (2011) 
is adhered to, whereas the assignment of genera to higher taxa 
follows the catalog of Alonso-Zarazaga and Lyal (1999). DNA 
was extracted from each ethanol-preserved specimen individ- 
ually using DNeasy blood and tissue extraction kits (Qiagen). 
The concentration of double-stranded DNA (dsDNA) in most 
extractions (139 of 173) was assayed on a Qubit fluorometer 
using a dsDNA high-sensitivity kit (Invitrogen). 



"Bait" Sequence PCR 

Standard PCR reactions to amplify four different fragments of 
mtDNA (coxl 5' "barcode region," coxl 3 ; region, rrnL and 
cytb) were undertaken for each of the 173 samples. Primers 
and reaction conditions are listed in supplementary table S10, 
Supplementary Material online. PCR products were first 
cleaned with a size-exclusion filter (Merck Millipore) and 
then Sanger sequenced; the resulting bait sequences were 
subsequently employed to identify mitogenomic assemblies 
in the manner detailed below. 
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Table 3. List of Software Used for the De Novo Assembly and Analysis of Mitogenomes, with their Main Function and Source URL 



Software 



Function 



URL a 



FastQC 


NGS quality assessment 


http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ 


Trimmomatic 


Adapter trimming 


http://www.usadellab.org/cms/index.php?page=trimmomatic 


Celera 


Genome assembly 


http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Main Page 


IDBA-UD 


Genome assembly 


http://i.cs.hku.hk/~alse/hkubrg/projects/idba_ud/ 


Minimus2 


Merging sequence sets 


http://sourceforge.net/apps/mediawi ki/amos/index.php?title=Minimus2 


Prinseq 


Sequence quality control 


http://edwards.sdsu.edu/cgi-bin/prinseq/prinseq.cgi 


COVE 


tRNA annotation 


http://selab.janelia.org/software.html 


FeatureExtract 


Gene extraction 


http://www.cbs.dtu.dk/services/FeatureExtract/ 


Geneious 


Gene annotation/sequence editing 


http://www.geneious.com/ 


MAFFT 


Sequence alignment 


h ttp://mafft.cbrc.j p/al ign ment/software/ 


BLAST 


Local alignment search 


http://blast.ncbi.nlm.nih.gov/Blast.cgi 


Partition Finder 


Partitioning scheme selection 


http://www.robertlanfear.com/partitionfinder/ 


CIPRES 


Phylogenetic analysis server 


http://www.phylo.org/ 


RAxML 


Maximum-likelihood phylogenetic analysis 


http://sco.h-its.org/exelixis/software.html 


"APE" package in R 


Phylogenetic analysis 


http://ape-package.i rd.fr/ 



a AII URLs were last accessed on May 10, 2014. 

Sample Pooling and Sequencing 

To minimize the effects of DNA concentration on assembly 
success across all samples, approximately equimolar quanti- 
ties of genomic DNA from each of the samples were pooled, 
aiming for 10 ng of dsDNA per sample, resulting in a DNA 
pool of approximately 1.5 jig. This calculation did not con- 
sider 31 samples which were not quantified because of limited 
sample volume. For each of these, a fixed volume of either 5 or 
8|il was added to the pool. Based on the findings of 
Crampton-Platt AL, Timmermans MJTN, Gimmel ML, Kutty 
SN, Cockerill TD, Khen CV, and Vogler AP (unpublished data), 
where longer insert size was found to result in longer mito- 
chondrial contigs, a TruSeq library was prepared from the 
pool aiming for an insert size of 800 bp. Quantification of 
the final library indicated that the average insert size was 
790 bp, and this was sequenced on a single lllumina MiSeq 
run (500-cycle, 250 bp paired-end reads, version 2 reagent kit). 

Mitogenomic Assembly Pipeline 

The bioinformatics assembly pipeline used in this study was 
developed by Crampton-Platt AL, Timmermans MJTN, 
Gimmel ML, Kutty SN, Cockerill TD, Khen CV, and Vogler 
AP (unpublished data) and is followed here with minor mod- 
ifications. A list of the software required (most freely available) 
is given in table 3 and a schematic overview of the principal 
steps is presented in figure 6. In brief, the raw data were 
trimmed of adapters using Trimmomatic (Bolger et al. 
2014), and putative mitochondrial reads were identified in a 
BLAST search (Altschul et al. 1990) against a custom reference 
database of 258 Coleoptera mitogenomes (E= 1e— 5; no re- 
striction in length overlap) (Timmermans MJTN, Barton C, 
Dodsworth S, Haran J, Ahrens D, Foster PG, Bocak L, and 
Vogler AP, unpublished data). The extracted mtDNA reads 
were subjected to whole-genome shot-gun assembly using 
Celera Assembler (Myers et al. 2000) and IDBA-UD (Peng 
et al. 2012), and the resulting contigs were filtered again for 
mtDNA hits against the Coleoptera reference library for 
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Fig. 6. Schematic flowchart of the principal steps for the bulk de novo 
assembly of mitogenomes and identification with PCR-amplified "bait" 
sequences. 



sequences of more than 1,000 bp overlap at E= 1e— 5. Both 
assemblies were merged using Minimus2 (Sommer et al. 
2007) to combine overlapping sequences from both assem- 
blers into longer scaffolds. 

To investigate the relationship between the number of 
generated sequencing reads and assembly success, all reads 
were mapped onto the obtained contigs using Geneious, al- 
lowing for 2% mismatches, a maximum gap size of 3 bp and 
requiring a minimum overlap of 100 bp. Annotations of each 
assembly were conducted by first mapping tRNA genes with 
COVE (Eddy and Durbin 1994), after which the intervening 
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protein and rRNA coding genes were extracted with 
FeatureExtract 1.2 (Wernersson 2005). To identify these 
genes, the resulting sequences were mapped to the 
Tribolium castaneum mitogenome (GenBank accession 
number NC_003081) using Geneious and were afterward ex- 
ported, by gene, into separate FASTA files. Sequences of less 
than one-third of total gene length were discarded. 

Identification of Mitogenomic Assemblies Using "Bait" 
Sequences 

To identify the mitogenomic assemblies by association with 
their respective originating specimen, BLAST searches were 
conducted for each bait sequence reference against all corre- 
sponding gene sequences extracted from the mitogenome 
assemblies (separately for cox7-5 / and 3 f regions, cytB and 
rrnL). Only hits with 100% pairwise identity and more than 
100 bp overlap were considered a successful identification. 
Where multiple bait sequences from a single specimen 
were available, each bait was checked to have hit the same 
long assembly unequivocally to test for possible chimeras. If 
baits from a single specimen matched multiple, nonoverlap- 
ping assemblies, they presumably correspond to the same 
incompletely assembled mitogenome. These assemblies 
were combined and retained if they included eight or more 
genes in total. Once mitogenomic assemblies were identified, 
the tRNA gene order in the cluster of six tRNA genes located 
between nad3 and nadS was visually recorded. 

Sequence Alignment and Data Set Concatenation 
The sequences for the genes nadS, nad4, nod4L, and nadl, 
which are transcribed on the reverse strand of the mitochon- 
drial genome, were reverse complemented prior to alignment. 
Twenty-eight additional curculionoid mitogenome sequences 
were obtained from GenBank (primarily those generated by 
Haran et al. [2013]; supplementary table S1, Supplementary 
Material online) to maximize taxon sampling. Two members 
of Chrysomeloidea were included as outgroups, following 
Haran et al. (2013). The combined sequences from each of 
the separated 13 protein-coding and two rRNA genes were 
individually aligned using the MAFFT 7.0 online server, under 
the FFT-NS-I slow iterative refinement strategy (Katoh et al. 
2002). Alignments were thereafter checked manually in 
Geneious for quality and to ensure that protein-coding 
genes were in the correct reading frame. Genes were conca- 
tenated together to make six different data matrices as fol- 
lows: All genes (A), only protein-coding genes (B), all genes 
with third codon positions removed from protein-coding 
genes (C), protein-coding genes only with third codon posi- 
tions removed (D), all genes with third codon positions re- 
moved from protein-coding genes and first codon positions 
R-Y coded (E), and only protein-coding genes with third 
codon positions removed and first codon positions R-Y 
coded (F). 

Phylogenetic Analyses 

Each of the six data sets was analyzed under the ML optimal- 
ly criterion using RAxML 7.6.6 (Stamatakis 2006) run on the 



CIPRES web-based server (Miller et al. 2010). To assess nodal 
support, a rapid BS with 1,000 iterations was run in parallel 
with tree-building. The data sets were each analyzed both 
partitioned by gene and unpartitioned (i.e., a single partition). 
Additionally, three of the data sets (A, B, and E) were first 
tested using Partition Finder (Lanfear et al. 2012) to objectively 
select the best-fitting partitioning scheme and model of mo- 
lecular evolution for each alignment. This was performed 
using the Bayesian Information Criterion from an initial 
partitioning of each of the three codon positions for each 
amino acid-coding sequence and each rRNA gene being 
separate partitions. The resulting ML trees were made ultra- 
metric using the chronos function of the APE package in R 
(Paradis et al. 2004), which uses penalized likelihood to fit a 
chronogram to a phylogenetic tree (Paradis 2013). To obtain 
a measure of the suitability of the mitogenomic data to 
robustly support relationships across different nodal ages 
(putative taxonomic ranks), we investigated the distribution 
of nodal support across trees by calculating the branch length 
from the root for each node using a custom R script and 
plotting this against its respective RAxML BS support. We 
also constructed a strict consensus tree from the 15 ML 
trees to visualize the distribution of consistent nodes 
across all our analyses. We performed additional RAxML anal- 
yses on data sets A and B partitioned by gene and separate 
codon positions for each protein-coding gene (41 and 39 
partitions, respectively) and various RAxML analyses 
on these two data sets with different combinations of 
partitioning schemes and topological constraints, as 
summarized in table 2, in order to calculate the AIC as a 
means for preferred model selection (Posada and Buckley 
2004). 

To investigate how successfully subsets of the full-data 
matrix were able to reconstruct the phylogeny, we also ana- 
lyzed (using RAxML with data partitioned by gene and by 
Partition Finder-derived partitions) three additional data sets, 
composed oft 1) only the reverse-tran scribed protein-coding 
genes (nadS, nad4, nod4L, and nadl), 2) the remaining nine 
forward-transcribed protein-coding genes, and 3) only the 
available "bait" sequences. The latter analysis was undertaken 
to ascertain whether there is any benefit in assembling the 
mitogenomes for phylogeny reconstruction, over the PCR 
sequences alone. 

Compositional heterogeneity was assessed on the 
protein-coding genes using the % 2 statistics (Swofford 2002). 
The resulting P value is the probability that the data are ho- 
mogeneous and is considered significant when less than 5%. 
This test suffers from a high probability of Type II error be- 
cause the test assumes independence of the data, which they 
are not. We therefore also used the test of Foster (2004), 
which uses simulations based on the ML tree, model, and 
data size to generate a valid null distribution of a % 2 value 
from the original data. The ML tree for the concatenated data 
was used in all cases when assessing heterogeneity in each 
gene, with any missing taxa pruned off. Model parameters 
and branch lengths were reoptimized under a GTR + G 
model. 
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Supplementary Material 

Supplementary tables S1, S2, S4, S7, and S10 and figures S3, S5, 
S6, S8, and S9 are available at Molecular Biology and Evolution 
online (http://www.mbe.oxfordjournals.org/). 
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